
    Improving Japanese Zero Pronoun Resolution by Global Word Sense Disambiguation

    This paper proposes unsupervised word sense disambiguation based on automatically constructed case frames and its incorporation into our zero pronoun resolution system. The word sense disambiguation is applied to verbs and nouns: we regard case frames as defining verb senses and semantic features in a thesaurus as defining noun senses, and perform sense disambiguation by selecting among them through case analysis. In addition, following the one sense per discourse heuristic, word sense disambiguation results are cached and applied globally to subsequent occurrences of the same word. We integrated this global word sense disambiguation into our zero pronoun resolution system and conducted zero pronoun resolution experiments on corpora from two different domains. Both sets of experimental results indicate the effectiveness of our approach.
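    As a rough illustration of the one sense per discourse caching described above, the Python sketch below caches each word's first disambiguated sense and reuses it for later occurrences in the same discourse. The disambiguate() function and its candidate senses are hypothetical stand-ins, not the paper's case-analysis-based disambiguation.

    def disambiguate(word, context):
        # Placeholder local WSD: pick the first candidate sense (illustrative only;
        # the real method selects a case frame / thesaurus feature via case analysis).
        candidate_senses = {
            "bank": ["financial_institution", "river_bank"],
            "case": ["grammatical_case", "container"],
        }
        return candidate_senses.get(word, ["unknown"])[0]

    def disambiguate_discourse(words, context=None):
        # Disambiguate each word, reusing a cached sense once one has been chosen
        # for this discourse (one sense per discourse heuristic).
        sense_cache = {}          # word -> sense fixed for this discourse
        resolved = []
        for word in words:
            if word in sense_cache:
                sense = sense_cache[word]        # reuse the globally cached sense
            else:
                sense = disambiguate(word, context)
                sense_cache[word] = sense        # cache for subsequent occurrences
            resolved.append((word, sense))
        return resolved

    if __name__ == "__main__":
        print(disambiguate_discourse(["bank", "case", "bank"]))
        # The second "bank" reuses the sense chosen for the first occurrence.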

    Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-hop Question Answering

    With the help of the detailed annotations in the HotpotQA question answering dataset, recent question answering models are trained to justify their predicted answers with supporting facts from the context documents. Some related works train a single model to find supporting facts and answers jointly, without specialized models for each task. Others train separate models for each task but do not use supporting facts effectively to find the answer: they either use only the predicted sentences and ignore the remaining context, or do not use them at all. Furthermore, while complex graph-based models consider the bridge/connection between documents in the multi-hop setting, simple BERT-based models usually drop it. We propose FlexibleFocusedReader (FFReader), a model that 1) Flexibly focuses on predicted supporting facts (SFs) without ignoring the important remaining context, 2) Focuses on the bridge between documents, despite not using graph architectures, and 3) Jointly learns to predict SFs and answer with two specialized models. Our model achieves consistent improvement over the baseline. In particular, we find that flexibly focusing on SFs is important, rather than ignoring the remaining context or not using SFs at all when finding the answer. We also find that tagging the entity that links the documents at hand is very beneficial. Finally, we show that joint training is crucial for FFReader.
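    The sketch below shows one way the bridge-entity tagging mentioned above could look in a BERT-style concatenated input; the [B]/[/B] marker tokens and helper names are assumptions for illustration, not necessarily FFReader's actual scheme.

    def mark_bridge_entity(document: str, bridge_entity: str,
                           start_tag: str = "[B]", end_tag: str = "[/B]") -> str:
        # Wrap every occurrence of the bridge entity with marker tokens so the
        # encoder can attend to the link between the two documents.
        return document.replace(bridge_entity, f"{start_tag} {bridge_entity} {end_tag}")

    def build_input(question: str, doc1: str, doc2: str, bridge_entity: str) -> str:
        # Concatenate the question and both documents, with the bridge entity
        # tagged, in a [CLS]/[SEP] layout typical of BERT-based readers.
        doc1_tagged = mark_bridge_entity(doc1, bridge_entity)
        doc2_tagged = mark_bridge_entity(doc2, bridge_entity)
        return f"[CLS] {question} [SEP] {doc1_tagged} [SEP] {doc2_tagged} [SEP]"

    if __name__ == "__main__":
        q = "Where was the author of Norwegian Wood born?"
        d1 = "Norwegian Wood is a novel by Haruki Murakami."
        d2 = "Haruki Murakami was born in Kyoto."
        print(build_input(q, d1, d2, bridge_entity="Haruki Murakami"))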

    Fertilization of case frame dictionary for robust Japanese case analysis

    This paper proposes a method of fertilizing a Japanese case frame dictionary to handle complicated expressions: double nominative sentences, non-gapping relation of relative clauses, and case change. Our method is divided into two stages. In the first stage, we parse a large corpus and construct a Japanese case frame dictionary automatically from the parse results. In the second stage, we apply case analysis to the large corpus utilizing the constructed case frame dictionary, and upgrade the case frame dictionary by incorporating the newly acquired information.
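    A minimal sketch of the two-stage procedure, with parse_sentence() and analyze_cases() as placeholder stand-ins for a real Japanese parser and case analyzer; the function names and demo data are illustrative assumptions, not the paper's implementation.

    from collections import defaultdict

    def parse_sentence(sentence):
        # Placeholder for a real Japanese parser: yields (verb, case, argument)
        # triples; here we fake a single triple for the demo sentence.
        return [("食べる", "ヲ", "パン")] if "パン" in sentence else []

    def analyze_cases(sentence, case_frames):
        # Placeholder for case analysis with the current dictionary: in the real
        # method this recovers arguments from double nominatives, non-gapping
        # relative clauses, and case change; here it returns nothing.
        return []

    def build_case_frames(corpus):
        # Stage 1: construct case frames automatically from raw parses of a corpus.
        case_frames = defaultdict(lambda: defaultdict(set))
        for sentence in corpus:
            for verb, case, arg in parse_sentence(sentence):
                case_frames[verb][case].add(arg)
        return case_frames

    def upgrade_case_frames(corpus, case_frames):
        # Stage 2: re-apply case analysis using the dictionary and fold the
        # newly acquired information back into it.
        for sentence in corpus:
            for verb, case, arg in analyze_cases(sentence, case_frames):
                case_frames[verb][case].add(arg)
        return case_frames

    if __name__ == "__main__":
        corpus = ["パンを食べる"]
        frames = upgrade_case_frames(corpus, build_case_frames(corpus))
        print({v: {c: sorted(a) for c, a in cf.items()} for v, cf in frames.items()})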

    Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese–Japanese Wikipedia

    Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more widely available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system that extracts both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences among the comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and a classifier for parallel sentence identification; we improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and that the parallel data extracted by our system significantly improves SMT performance.
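    The overall pipeline can be pictured roughly as below: a lexicon-based candidate filter plus a classifier select parallel sentences, and fragments are then extracted from the remaining comparable sentences via word-level alignment and a lexicon filter. All scoring components here are simplistic stand-ins for the article's actual filter, classifier, and alignment model; names, thresholds, and the toy lexicon are assumptions.

    def candidate_filter(src, tgt, lexicon, min_overlap=0.3):
        # Keep sentence pairs whose lexicon-translated word overlap is high enough.
        translated = {lexicon.get(w) for w in src.split()} - {None}
        tgt_words = set(tgt.split())
        return len(translated & tgt_words) / max(len(tgt_words), 1) >= min_overlap

    def classify_parallel(src, tgt):
        # Placeholder for the parallel-sentence classifier; always accepts here.
        return True

    def extract_fragments(src, tgt, lexicon):
        # Crude word-level "alignment": keep source words whose lexicon
        # translation appears in the target sentence.
        tgt_words = tgt.split()
        return [(w, lexicon[w]) for w in src.split() if lexicon.get(w) in tgt_words]

    def extract(comparable_pairs, lexicon):
        sentences, fragments = [], []
        for src, tgt in comparable_pairs:
            if candidate_filter(src, tgt, lexicon) and classify_parallel(src, tgt):
                sentences.append((src, tgt))      # accepted as a parallel sentence
            else:
                fragments.extend(extract_fragments(src, tgt, lexicon))
        return sentences, fragments

    if __name__ == "__main__":
        lexicon = {"猫": "cat", "犬": "dog"}
        pairs = [("猫 が 好き", "the cat"), ("犬 と 経済", "the dog ran away fast")]
        print(extract(pairs, lexicon))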

    Building a Diverse Document Leads Corpus Annotated with Semantic Relations
